Hands on BgeeCall package

Bgee database: Bgee is a database to retrieve and compare gene expression patterns in multiple animal species and produced from multiple data types (RNA-Seq, Affymetrix, in situ hybridization, and EST data). It notably integrates RNA-Seq libraries for 29 species.

Reference intergenic regions: Reference intergenic regions are defined in the Bgee RNA-Seq pipeline. Candidate intergenic regions are defined using gene annotation data. For each species, over all available libraries, reads are mapped to these intergenic regions with kallisto, as well as to genes. This “intergenic expression” is deconvoluted to distinguish reference intergenic from non annotated genes, which have higher expression. Reference intergenic regions are then defined as intergenic regions with low expression level over all RNA-Seq libraries, relative to genes. This step allows not to consider regions wrongly considered as intergenic because of potential gene annotation quality problem as intergenic.

Threshold of present/absent: By default BgeeCall calculate a pValue to define calls. By default genes are consider present if the pValue is lower or equal to 0.05.

Get transcriptome and annotation files

  1. Extract transcriptome and annotation files for Drosophila melanogaster from the Ensembl database using the download.file() function (Note: use the following links to download annotation and cdna file: http://ftp.ensemblgenomes.org/pub/release-51/metazoa/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.32.51.chr.gtf.gz & http://ftp.ensemblgenomes.org/pub/release-51/metazoa/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.32.cdna.all.fa.gz).
# annotation<-download.file(url='http://ftp.ensemblgenomes.org/pub/release-51/metazoa/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.32.51.chr.gtf.gz', destfile='annotation/Drosophila_melanogaster.BDGP6.32.51.chr.gtf.gz', method='curl')
# 
# cdna<-download.file(url='http://ftp.ensemblgenomes.org/pub/release-51/metazoa/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.32.cdna.all.fa.gz', destfile='annotation/Drosophila_melanogaster.BDGP6.32.cdna.all.fa.gz', method='curl')

Retrieve intergenic information

  1. List all intergenic releases available in BgeeCall. How many exist?

  2. Verify which species are available for the current Bgee intergenic release. How many exist?

  3. Verify which species belong to the community. How many exist?

# Install the package
# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("BgeeCall")

# install.packages("devtools")
# library(devtools)
# install_github("BgeeDB/BgeeCall")

# Load the package
library(BgeeCall)

list_intergenic_release()
##     release releaseDate                               FTPURL
## 1       1.0  2021-06-11 https://bgee.org/ftp/intergenic/1.0/
## 2       0.2  2019-02-07 https://bgee.org/ftp/intergenic/0.2/
## 3       0.1  2018-12-21 https://bgee.org/ftp/intergenic/0.1/
## 4 community  2019-07-22                                     
## 5    custom  2019-07-22                                     
##                                                      referenceIntergenicFastaURL
## 1 https://bgee.org/ftp/intergenic/1.0/ref_intergenic/SPECIES_ID_intergenic.fa.gz
## 2 https://bgee.org/ftp/intergenic/0.2/ref_intergenic/SPECIES_ID_intergenic.fa.gz
## 3 https://bgee.org/ftp/intergenic/0.1/ref_intergenic/SPECIES_ID_intergenic.fa.gz
## 4                                                                               
## 5                                                                               
##   minimumVersionBgeeCall
## 1                  0.9.9
## 2                  0.9.9
## 3                  0.9.9
## 4                  1.1.0
## 5                  1.1.0
##                                                                                                                                                                                                     description
## 1                                                                                                                                                                  intergenic regions used to generate Bgee 15.
## 2                                                                          cleaned intergenic sequences based on release 0.1 (remove blocks of Ns longer than 100 and sequences containing more than 5% of Ns).
## 3                                                                                                                                                                  intergenic regions used to generate Bgee 14.
## 4                                                                           Release allowing to access to all reference intergenic sequences generated by the community and not present in Bgee for the moment.
## 5 Release allowing to use your own FASTA reference intergenic sequences. When this release is selected BgeeCall will use the sequences at UserMetadata@custom_intergenic_path to generate present/absent calls.
##                                                                            messageToUsers
## 1                                                                                        
## 2                         be careful, this intergenic release has not been tested by Bgee
## 3                                                                                        
## 4 These reference intergenic sequences have not been generated by Bgee. Use with caution.
## 5                              You decided to use your own reference intergenic sequences
# create BgeeMetadata object and define one reference intergenic release
bgee <- new("BgeeMetadata", intergenic_release = "1.0")
#> Querying Bgee to get intergenic release information..

# List all species for which Bgee reference intergenic sequences
list_bgee_ref_intergenic_species(myBgeeMetadata = bgee)
##    speciesId               speciesName numberOfLibraries
## 1       9606              Homo sapiens              5984
## 2       9913                Bos taurus              2774
## 3       9555              Papio anubis               814
## 4       9103       Meleagris gallopavo               594
## 5      10090              Mus musculus               566
## 6       9823                Sus scrofa               528
## 7       9940                Ovis aries               434
## 8      60711       Chlorocebus sabaeus               409
## 9       8090           Oryzias latipes               333
## 10     10141           Cavia porcellus               284
## 11     10181     Heterocephalus glaber               274
## 12     69293    Gasterosteus aculeatus               274
## 13      9544            Macaca mulatta               264
## 14      8364        Xenopus tropicalis               259
## 15      7227   Drosophila melanogaster               257
## 16      9598           Pan troglodytes               250
## 17      9796            Equus caballus               248
## 18    105023    Nothobranchius furzeri               165
## 19      9615    Canis lupus familiaris               162
## 20      7955               Danio rerio               161
## 21     10116         Rattus norvegicus               116
## 22     13616     Monodelphis domestica               115
## 23      9986     Oryctolagus cuniculus               104
## 24      9031             Gallus gallus                84
## 25      7994        Astyanax mexicanus                64
## 26      9925              Capra hircus                64
## 27      8355            Xenopus laevis                57
## 28      8049              Gadus morhua                57
## 29      7740 Branchiostoma lanceolatum                52
## 30      8081       Poecilia reticulata                45
## 31      6239    Caenorhabditis elegans                41
## 32      8154  Astatotilapia calliptera                38
## 33      9541       Macaca fascicularis                37
## 34     52904      Scophthalmus maximus                36
## 35      7936         Anguilla anguilla                36
## 36      9685               Felis catus                34
## 37      8030               Salmo salar                32
## 38     32507  Neolamprologus brichardi                32
## 39     28377       Anolis carolinensis                31
## 40      8010               Esox lucius                24
## 41      7918      Lepisosteus oculatus                21
## 42      9258  Ornithorhynchus anatinus                21
## 43      9545         Macaca nemestrina                19
## 44      9531           Cercocebus atys                18
## 45     30608        Microcebus murinus                18
## 46      7240       Drosophila simulans                15
## 47      9593           Gorilla gorilla                15
## 48      9483        Callithrix jacchus                14
## 49      7897       Latimeria chalumnae                14
## 50      9597              Pan paniscus                13
## 51      9974            Manis javanica                11
## 52      7237  Drosophila pseudoobscura                10
##              genomeVersion
## 1               GRCh38.p13
## 2               ARS-UCD1.2
## 3                 Panu_3.0
## 4              Turkey_2.01
## 5                GRCm38.p6
## 6              Sscrofa11.1
## 7                 Oar_v3.1
## 8                ChlSab1.1
## 9              ASM223467v1
## 10               Cavpor3.0
## 11       HetGla_female_1.0
## 12                BROAD S1
## 13                 Mmul_10
## 14 Xenopus_tropicalis_v9.1
## 15                BDGP6.28
## 16             Pan_tro_3.0
## 17               EquCab3.0
## 18            Nfu_20140520
## 19               CanFam3.1
## 20                  GRCz11
## 21                Rnor_6.0
## 22                ASM229v1
## 23               OryCun2.0
## 24                  GRCg6a
## 25  Astyanax_mexicanus-2.0
## 26                    ARS1
## 27                 xenLae2
## 28                 gadMor1
## 29                 BraLan2
## 30     Guppy_female_1.0_MT
## 31                WBcel235
## 32              fAstCal1.2
## 33 Macaca_fascicularis_5.0
## 34             ASM318616v1
## 35            fAngAng1.pri
## 36         Felis_catus_9.0
## 37               ICSASG_v2
## 38               NeoBri1.0
## 39               AnoCar2.0
## 40                 Eluc_v4
## 41                 LepOcu1
## 42           mOrnAna1.p.v1
## 43                Mnem_1.0
## 44                Caty_1.0
## 45                Mmur_3.0
## 46              ASM75419v3
## 47                 gorGor4
## 48             ASM275486v1
## 49                 LatCha1
## 50               panpan1.1
## 51          YNU_ManJav_2.0
## 52                Dpse_3.0
# Number of available species in Bgee 1.0
nrow(list_bgee_ref_intergenic_species(myBgeeMetadata = bgee))
## [1] 52
# Community reference intergenic
list_community_ref_intergenic_species()
##   speciesId numberOfLibraries annotationVersion genomeVersion kallistoVersion
## 1     10036                15         MesAur1.0     MesAur1.0          0.46.0
## 2     13686               243            Si_gnG        Si_gnG          0.44.0
##                                                                                      url
## 1 https://zenodo.org/api/files/f46c7de0-d9a5-4ffd-a30e-4b08121ba446/ref_intergenic.fa.gz
## 2 https://zenodo.org/api/files/5492ff2f-91a3-4101-8d67-78b8f8625cc6/ref_intergenic.fa.gz

Use BgeeCall to download the pseudo-alignment software

  1. Create an object of the KallistoMetadata class.

  2. If you don’t have Kallisto software installed on your computer, specify the argument download_kallisto = TRUE, otherwise leave download_kallisto attribute by default FALSE.

  1. if you don’t know whether you have Kallisto installed just check that by typing the following command in the terminal: kallisto version
kallisto <- new("KallistoMetadata", download_kallisto = F)
# calls_output <- generate_calls_workflow(myAbundanceMetadata = kallisto, userMetadata = user_BgeeCall)

Run analysis: Drosophila melanogaster 1 sample

  1. Create a userMetadata object (note that you have to specify in the argument species_id the Taxonomy ID, you can verify that in https://bgee.org/ in the See species information).

Species: Drosophila melanogaster (fruit fly) Scientific name: Drosophila melanogaster Common name: fruit fly Species ID: 7227 Genome source: Ensembl Genome version: BDGP6.28

To start, we need: >- a transcriptome >- gene annotations >- your RNA-Seq reads in fastq files

  1. What happens if the argument reads_size is not specified by you when you create the new userMetadata object? What can be the impact in general? Reads size of RNA-Seq libraries can be found in SRA (e.g https://www.ncbi.nlm.nih.gov/sra/?term=SRX109278)
  • Read depth varies depending on the goals of the RNA-Seq study. Most experiments require 5–200 million reads per sample, depending on organism complexity and size, along with project aims.
  1. Specify by using the following functions setRNASeqLibPath(), setTranscriptomeFromFile(), setAnnotationFromFile(), setOutputDir() and setWorkingPath() the path to your library SRX109278, transcriptome file, annotation file as well as the output and working directory.

  2. Generate the present and absent calls for the library SRX109278 by using generate_calls_workflow(). Which type of information is provided in the output files?

library(BgeeCall)
# library(AnnotationHub)
# ah <- AnnotationHub()
# ah_resources <- query(ah, c('Ensembl', 'Drosophila melanogaster','32'))
# annotation_object <- ah_resources[["AH89791"]] # Drosophila_melanogaster.BDGP6.32.103.gtf


# remove MtDNA not tag as Drosophila_melanogaster genome
# annotation_object <- dropSeqlevels(annotation_object, "MtDNA", "coarse")
# transcriptome_object <- rtracklayer::import.2bit(ah_resources[["AH90689"]])# Drosophila_melanogaster.BDGP6.32.cdna.all.2bit 

# create an object of class UserMetadata and specify the species ID
user_BgeeCall <- new("UserMetadata", species_id = "7227")

# import annotation and transcriptome in the user_BgeeCall object
# it is possible to import them using an S4 object (GRanges, DNAStringSet) or a file (gtf, fasta)
user_BgeeCall <- setAnnotationFromFile(user_BgeeCall, "/Users/minooashtiani/Desktop/UNIL.task/annotation/Drosophila_melanogaster.BDGP6.32.51.chr.gtf.gz", "Drosophila_melanogaster.BDGP6.32.51")

user_BgeeCall <- setTranscriptomeFromFile(user_BgeeCall, "/Users/minooashtiani/Desktop/UNIL.task/annotation/Drosophila_melanogaster.BDGP6.32.cdna.all.fa.gz", "Drosophila_melanogaster.BDGP6.32.51")

# user <- new("UserMetadata")
user_BgeeCall <- setWorkingPath(user_BgeeCall, "/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX109278")
user_BgeeCall <- setOutputDir(user_BgeeCall, "/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX109278")

# provide path to the directory of your RNA-Seq library
user_BgeeCall <- setRNASeqLibPath(user_BgeeCall, "/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX109278")

calls_output <- generate_calls_workflow(userMetadata = user_BgeeCall)

head(read.table(calls_output$calls_tsv_path, header = TRUE), n = 5)
##            id  abundance   counts   length        biotype  type     zScore
## 1 FBgn0000008 1.97295149  45.0000 4010.677 protein_coding genic  2.1892794
## 2 FBgn0000014 3.25513733  70.0000 3781.380 protein_coding genic  2.5491954
## 3 FBgn0000015 0.06412318   3.0000 8226.740 protein_coding genic -0.2737588
## 4 FBgn0000017 4.47344731 178.1064 7000.980 protein_coding genic  2.7777282
## 5 FBgn0000018 3.67799276  34.0000 1625.510 protein_coding genic  2.6369868
##        pValue    call
## 1 0.014288269 present
## 2 0.005398589 present
## 3 0.607865022  absent
## 4 0.002737020 present
## 5 0.004182305 present
  1. Plot the frequency of p-values for the correspondent library.
df<-read.table(calls_output$calls_tsv_path, header = TRUE)
library(ggpubr)
p <- ggboxplot(df, x = "call", y = "pValue",
          color = "call", palette = "jco",
          add = "jitter")
#  Add p-value
p

hist(df$pValue, freq = FALSE, col="green", breaks = 20 )

Run analysis: multiple Drosophila melanogaster samples

  1. Create a user input file describing all RNA-Seq libraries previously downloaded, see https://github.com/BgeeDB/BgeeCall/blob/develop/inst/userMetadataTemplate.tsv and the vignette of the package for more information
userMetadataTemplate<-tibble(species_id=rep(7227, 6),   run_ids=rep("-",6), reads_size=rep(x = 52,6),   rnaseq_lib_path=c("/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX109278","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX109279","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX493950","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX493999","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX1720957","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX1720958"), transcriptome_path=rep("/Users/minooashtiani/Desktop/UNIL.task/annotation/Drosophila_melanogaster.BDGP6.32.cdna.all.fa.gz",6),  annotation_path=rep("/Users/minooashtiani/Desktop/UNIL.task/annotation/Drosophila_melanogaster.BDGP6.32.51.chr.gtf.gz",6),  output_directory=c("/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX109278","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX109279","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX493950","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX493999","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX1720957","/Users/minooashtiani/Desktop/UNIL.task/bgeecall_exercise/SRX1720958"),devStage=c("day4adult:FBdv:00007079","day4adult:FBdv:00007079","day4adult:FBdv:00007079","day4adult:FBdv:00007079","fullyFormed:UBERON:0000066","fullyFormed:UBERON:0000066"))

userMetadataTemplate
## # A tibble: 6 × 8
##   species_id run_ids reads_size rnaseq_lib_path  trans…¹ annot…² outpu…³ devSt…⁴
##        <dbl> <chr>        <dbl> <chr>            <chr>   <chr>   <chr>   <chr>  
## 1       7227 -               52 /Users/minooash… /Users… /Users… /Users… day4ad…
## 2       7227 -               52 /Users/minooash… /Users… /Users… /Users… day4ad…
## 3       7227 -               52 /Users/minooash… /Users… /Users… /Users… day4ad…
## 4       7227 -               52 /Users/minooash… /Users… /Users… /Users… day4ad…
## 5       7227 -               52 /Users/minooash… /Users… /Users… /Users… fullyF…
## 6       7227 -               52 /Users/minooash… /Users… /Users… /Users… fullyF…
## # … with abbreviated variable names ¹​transcriptome_path, ²​annotation_path,
## #   ³​output_directory, ⁴​devStage
# write.table(userMetadataTemplate, file = "/Users/minooashtiani/Desktop/UNIL.task/userMetadataTemplate.tsv", row.names=FALSE, sep="\t")
  1. Run the generation of present and absent calls from the user file with default values for all .
calls_output <- generate_calls_workflow(userFile = "/Users/minooashtiani/Desktop/UNIL.task/userMetadataTemplate.tsv")
  1. Combine multiple libraries per species using the merging_libraries() function. What is the proportion of genes present?
 merging_libraries(userFile = "/Users/minooashtiani/Desktop/UNIL.task/userMetadataTemplate.tsv", approach = "BH", condition = "species_id", cutoff = 0.01, outDir = "/Users/minooashtiani/Desktop/UNIL.task")
  1. Modify the input file to combine libraries per species (species_id) and developmental stage (devStage), see the structure of the file here: https://github.com/BgeeDB/BgeeCall/blob/develop/inst/userMetadataTemplate_merging.tsv developmental stages of libraries :
  • fully formed stage (ID : UBERON:0000066) for libraries SRX1720957 and SRX1720958
  • day 4 of adulthood (ID : FBdv:00007079) for libraries SRX493950, SRX493999, SRX109278 and SRX109279
merging_libraries(userFile = "/Users/minooashtiani/Desktop/UNIL.task/userMetadataTemplate.tsv", approach = "BH", condition = c("species_id", "devStage"), cutoff = 0.01, outDir =  "/Users/minooashtiani/Desktop/UNIL.task")
  1. Generate the present and absent calls with a more restrictive p-value = 0.01
# use intergenic approach with cutoff ratio 0.01
kallisto <- new("KallistoMetadata", cutoff_type = "intergenic", cutoff = 0.01)

calls_output <- generate_calls_workflow(userFile = "/Users/minooashtiani/Desktop/UNIL.task/userMetadataTemplate.tsv", abundanceMetadata = kallisto)
  1. Get summary stats of all libraries by using get_summary_stats() function.
BgeeCall::get_summary_stats(userFile = "/Users/minooashtiani/Desktop/UNIL.task/userMetadataTemplate.tsv", outDir = "/Users/minooashtiani/Desktop/UNIL.task")
  1. Plot the proportion of protein coding genes of all libraries for each p-value cut-off.
df<-read.table("/Users/minooashtiani/Desktop/UNIL.task/summary_Stats_All_Libraries.tsv", header = TRUE)
df$cutoff<-as.character(df$cutoff)

library(ggpubr)
p <- ggboxplot(df, x = "libraryId", y = "proportionCodingPresent",
          color = "cutoff", palette = "jco",
          add = "jitter")
#  Add p-value
p

Downstream analysis

The aim of this part is to show you that you can go from BgeeCall results to forward analysis.

  1. Perform a differential expression analysis between different developmental stage conditions. (Note: that in the provided dataset we have 4 samples from FBdv:00007079 and 2 samples from UBERON:0000066, so you can select just 2 samples from FBdv:00007079 (like: SRX109278 and SRX109279) to balance the analysis. Note that statistically it is recommended to use at least 3 samples of each condition for differential expression analysis).
library(tidyverse)
library("DESeq2")

colData<- DataFrame(row.names = c(   "SRX109278",
             "SRX109279",
             "SRX1720957",
             "SRX1720958"),
                    condition=c("day4adult:FBdv:00007079","day4adult:FBdv:00007079","fullyFormed:UBERON:0000066","fullyFormed:UBERON:0000066"))

colData$condition<-as.factor(colData$condition)
colData
## DataFrame with 4 rows and 1 column
##                             condition
##                              <factor>
## SRX109278  day4adult:FBdv:00007079   
## SRX109279  day4adult:FBdv:00007079   
## SRX1720957 fullyFormed:UBERON:0000066
## SRX1720958 fullyFormed:UBERON:0000066
# coldata <- colData(gse)
colData$condition <- relevel(colData$condition , "day4adult:FBdv:00007079")

df<-read.table(calls_output[[1]]$calls_tsv_path, header = TRUE)$counts
# df

countdf<-tibble(ID=read.table(calls_output[[1]]$calls_tsv_path, header = TRUE)$id,
  "SRX109278"=read.table(calls_output[[1]]$calls_tsv_path, header = TRUE)$counts,
               "SRX109279"=read.table(calls_output[[2]]$calls_tsv_path, header = TRUE)$counts,
               "SRX1720957"=read.table(calls_output[[5]]$calls_tsv_path, header = TRUE)$counts,
             "SRX1720958"=read.table(calls_output[[6]]$calls_tsv_path, header = TRUE)$counts)
# head(countdf)

library(tidyverse)
countdf<-countdf %>% remove_rownames %>% column_to_rownames(var="ID")

countdf1<-countdf%>% mutate_all(as.integer)

head(countdf1)
##             SRX109278 SRX109279 SRX1720957 SRX1720958
## FBgn0000008        45        50         61         54
## FBgn0000014        70       299        358        396
## FBgn0000015         3        26         12          7
## FBgn0000017       178       307        798        735
## FBgn0000018        34        41        378        329
## FBgn0000022         0         0          0          0
cstm<- countdf1

#column sums of the count
colSums(cstm)
##  SRX109278  SRX109279 SRX1720957 SRX1720958 
##    5342676    4386098   14569868   16077526
# Normalization using DESeq2 (size factors)
# biocLite("DESeq2")
library(DESeq2)
# dds <- DESeqDataSetFromMatrix(countData = cstm,
#                          colData = colData,design = ~ condition )
dds <- DESeqDataSetFromMatrix(countData = cstm,
                         colData = colData,design = ~ condition )
# names(assays(dds))

# minimal pre-filtering to keep only rows that have at least 10 reads total.
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
rm(keep)

#Estimate size factors: library size estimators 
dds <- estimateSizeFactors( dds )

## the rlog transform
library("dplyr")
library("ggplot2")
rld <- rlog(dds, blind = FALSE)
# ## Assessment of Overall Similarity between Samples
sampleDists <- dist( t( assay(rld) ) )

library("pheatmap")
library("RColorBrewer")
library(grid)

sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste( rld$condition, sep="-" )
colnames(sampleDistMatrix) <- paste( rld$condition, sep="-" )
colors <- colorRampPalette( rev(brewer.pal(9, "Greens")) )(255)

## Edit body of pheatmap:::draw_colnames, customizing it to your liking
draw_colnames_45 <- function (coln, ...) {
    m = length(coln)
    x = (1:m)/m - 1/2/m
    grid.text(coln, x = x, y = unit(0.96, "npc"), vjust = .5, 
        hjust = 1, rot = 45, gp = gpar(...)) ## Was 'hjust=0' and 'rot=270'
}

## For pheatmap_1.0.8 and later:
draw_colnames_45 <- function (coln, gaps, ...) {
    coord = pheatmap:::find_coordinates(length(coln), gaps)
    x = coord$coord - 0.5 * coord$size
    res = textGrob(coln, x = x, y = unit(1, "npc") - unit(3,"bigpts"), vjust = 0.5, hjust = 1, rot = 45, gp = gpar(...))
    return(res)}

## 'Overwrite' default draw_colnames with your own version 
assignInNamespace(x="draw_colnames", value="draw_colnames_45",
ns=asNamespace("pheatmap"))

pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)

Top genes : sort by log2FoldChange

## Differential Gene Expression Analysis
dds$condition <- relevel(dds$condition, "day4adult:FBdv:00007079")

dds <- DESeq(dds)
res <- results(dds)

dfres<-as.data.frame(res)

## Summary of results

summary(res)
## 
## out of 12199 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 1580, 13%
## LFC < 0 (down)     : 1949, 16%
## outliers [1]       : 0, 0%
## low counts [2]     : 237, 1.9%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
# #Top genes : sort by log2FoldChange
resSort <- res[order(res$log2FoldChange),]
head(resSort)
## log2 fold change (MLE): condition fullyFormed.UBERON.0000066 vs day4adult.FBdv.00007079 
## Wald test p-value: condition fullyFormed.UBERON.0000066 vs day4adult.FBdv.00007079 
## DataFrame with 6 rows and 6 columns
##              baseMean log2FoldChange     lfcSE      stat      pvalue
##             <numeric>      <numeric> <numeric> <numeric>   <numeric>
## FBgn0003356  1347.591       -14.6098   2.44843  -5.96701 2.41635e-09
## FBgn0011669  1277.741       -14.5330   2.71732  -5.34829 8.87867e-08
## FBgn0031617   899.550       -14.0265   1.57780  -8.88988 6.11729e-19
## FBgn0003357  2218.527       -13.8876   1.54992  -8.96017 3.24175e-19
## FBgn0004045  3295.669       -13.4620   1.21187 -11.10847 1.14099e-28
## FBgn0043825   577.431       -13.3874   1.50660  -8.88582 6.34488e-19
##                    padj
##               <numeric>
## FBgn0003356 7.24421e-08
## FBgn0011669 1.90335e-06
## FBgn0031617 9.38141e-17
## FBgn0003357 5.46167e-17
## FBgn0004045 5.33404e-26
## FBgn0043825 9.60728e-17
  1. Filter the results by providing just genes with FDR < 0.01. Provide a visualization graphic as MA plot.
# MA plot
res.noshr <- results(dds, name="condition_fullyFormed.UBERON.0000066_vs_day4adult.FBdv.00007079")
plotMA(res.noshr, ylim = c(-5,5))

## Volcano plot
library("EnhancedVolcano")

# The main function is named after the package
EnhancedVolcano(toptable = res,              # We use the shrunken log2 fold change as noise associated with low count genes is removed 
                x = "log2FoldChange",           # Name of the column in resLFC that contains the log2 fold changes
                y = "padj",                     # Name of the column in resLFC that contains the p-value
                lab = rownames(res),pointSize = 2,
  labSize = 5,  legendLabSize = 5,
  legendIconSize = 5,  encircleSize = 1,xlim = c(-5,5))

  1. Make a GO analysis. Which type of information do you retrieve?

GO for downregulated genes

## AnnotationHub with 1 record
## # snapshotDate(): 2020-10-27
## # names(): AH84121
## # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
## # $species: Drosophila melanogaster
## # $rdataclass: OrgDb
## # $rdatadateadded: 2020-10-20
## # $title: org.Dm.eg.db.sqlite
## # $description: NCBI gene ID based annotations about Drosophila melanogaster
## # $taxonomyid: 7227
## # $genome: NCBI genomes
## # $sourcetype: NCBI/ensembl
## # $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
## # $sourcesize: NA
## # $tags: c("NCBI", "Gene", "Annotation") 
## # retrieve record with 'object[["AH84121"]]'

GO for upregulated genes

## 
## 
## Which column contains my new Entrez IDs?
##    SYMBOL ENTREZID
## 1    Rhau    35339
## 2  unc-13    43841
## 3    Gale    38076
## 4 CG34195    37018
## 5  CG3491    34870
## 6     hts    37230